{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "550edc01b00149cd",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Zep\n",
    "> Recall, understand, and extract data from chat histories. Power personalized AI experiences.\n",
    "\n",
    "> [Zep](https://www.getzep.com) is a long-term memory service for AI Assistant apps.\n",
    "> With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant,\n",
    "> while also reducing hallucinations, latency, and cost.\n",
    "\n",
    "> Interested in Zep Cloud? See [Zep Cloud Installation Guide](https://help.getzep.com/sdks) and [Zep Cloud Vector Store example](https://help.getzep.com/langchain/examples/vectorstore-example)\n",
    "\n",
    "## Open Source Installation and Setup\n",
    "\n",
    "> Zep Open Source project: [https://github.com/getzep/zep](https://github.com/getzep/zep)\n",
    ">\n",
    "> Zep Open Source Docs: [https://docs.getzep.com/](https://docs.getzep.com/)\n",
    "\n",
    "You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration\n",
    "\n",
    "## Usage\n",
    "\n",
    "In the examples below, we're using Zep's auto-embedding feature which automatically embeds documents on the Zep server \n",
    "using low-latency embedding models.\n",
    "\n",
    "## Note\n",
    "- These examples use Zep's async interfaces. Call sync interfaces by removing the `a` prefix from the method names.\n",
    "- If you pass in an `Embeddings` instance Zep will use this to embed documents rather than auto-embed them.\n",
    "You must also set your document collection to `isAutoEmbedded === false`. \n",
    "- If you set your collection to `isAutoEmbedded === false`, you must pass in an `Embeddings` instance."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a3a11aab1412d98",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Load or create a Collection from documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "519418421a32e4d",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:07:50.672390Z",
     "start_time": "2023-08-13T01:07:48.777799Z"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from uuid import uuid4\n",
    "\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "from langchain_community.vectorstores import ZepVectorStore\n",
    "from langchain_community.vectorstores.zep import CollectionConfig\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "ZEP_API_URL = \"http://localhost:8000\"  # this is the API url of your Zep instance\n",
    "ZEP_API_KEY = \"<optional_key>\"  # optional API Key for your Zep instance\n",
    "collection_name = f\"babbage{uuid4().hex}\"  # a unique collection name. alphanum only\n",
    "\n",
    "# Collection config is needed if we're creating a new Zep Collection\n",
    "config = CollectionConfig(\n",
    "    name=collection_name,\n",
    "    description=\"<optional description>\",\n",
    "    metadata={\"optional_metadata\": \"associated with the collection\"},\n",
    "    is_auto_embedded=True,  # we'll have Zep embed our documents using its low-latency embedder\n",
    "    embedding_dimensions=1536,  # this should match the model you've configured Zep to use.\n",
    ")\n",
    "\n",
    "# load the document\n",
    "article_url = \"https://www.gutenberg.org/cache/epub/71292/pg71292.txt\"\n",
    "loader = WebBaseLoader(article_url)\n",
    "documents = loader.load()\n",
    "\n",
    "# split it into chunks\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "# Instantiate the VectorStore. Since the collection does not already exist in Zep,\n",
    "# it will be created and populated with the documents we pass in.\n",
    "vs = ZepVectorStore.from_documents(\n",
    "    docs,\n",
    "    collection_name=collection_name,\n",
    "    config=config,\n",
    "    api_url=ZEP_API_URL,\n",
    "    api_key=ZEP_API_KEY,\n",
    "    embedding=None,  # we'll have Zep embed our documents using its low-latency embedder\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "201dc57b124cb6d7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:07:53.807663Z",
     "start_time": "2023-08-13T01:07:50.671241Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 0/401 documents embedded\n",
      "Embedding status: 401/401 documents embedded\n"
     ]
    }
   ],
   "source": [
    "# wait for the collection embedding to complete\n",
    "\n",
    "\n",
    "async def wait_for_ready(collection_name: str) -> None:\n",
    "    import time\n",
    "\n",
    "    from zep_python import ZepClient\n",
    "\n",
    "    client = ZepClient(ZEP_API_URL, ZEP_API_KEY)\n",
    "\n",
    "    while True:\n",
    "        c = await client.document.aget_collection(collection_name)\n",
    "        print(\n",
    "            \"Embedding status: \"\n",
    "            f\"{c.document_embedded_count}/{c.document_count} documents embedded\"\n",
    "        )\n",
    "        time.sleep(1)\n",
    "        if c.status == \"ready\":\n",
    "            break\n",
    "\n",
    "\n",
    "await wait_for_ready(collection_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94ca9dfa7d0ecaa5",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Simarility Search Query over the Collection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1998de0a96fe89c3",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:07:54.195988Z",
     "start_time": "2023-08-13T01:07:53.808550Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the positions of the two principal planets, (and these the most\n",
      "necessary for the navigator,) Jupiter and Saturn, require each not less\n",
      "than one hundred and sixteen tables. Yet it is not only necessary to\n",
      "predict the position of these bodies, but it is likewise expedient to\n",
      "tabulate the motions of the four satellites of Jupiter, to predict the\n",
      "exact times at which they enter his shadow, and at which their shadows\n",
      "cross his disc, as well as the times at which they are interposed  ->  0.9003241539387915 \n",
      "====\n",
      "\n",
      "furnish more than a small fraction of that aid to navigation (in the\n",
      "large sense of that term), which, with greater facility, expedition, and\n",
      "economy in the calculation and printing of tables, it might be made to\n",
      "supply.\n",
      "\n",
      "Tables necessary to determine the places of the planets are not less\n",
      "necessary than those for the sun, moon, and stars. Some notion of the\n",
      "number and complexity of these tables may be formed, when we state that  ->  0.8911165633479508 \n",
      "====\n",
      "\n",
      "the scheme of notation thus applied, immediately suggested the\n",
      "advantages which must attend it as an instrument for expressing the\n",
      "structure, operation, and circulation of the animal system; and we\n",
      "entertain no doubt of its adequacy for that purpose. Not only the\n",
      "mechanical connexion of the solid members of the bodies of men and\n",
      "animals, but likewise the structure and operation of the softer parts,\n",
      "including the muscles, integuments, membranes, &c. the nature, motion,  ->  0.8899750214770481 \n",
      "====\n"
     ]
    }
   ],
   "source": [
    "# query it\n",
    "query = \"what is the structure of our solar system?\"\n",
    "docs_scores = await vs.asimilarity_search_with_relevance_scores(query, k=3)\n",
    "\n",
    "# print results\n",
    "for d, s in docs_scores:\n",
    "    print(d.page_content, \" -> \", s, \"\\n====\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e02b61a9af0b2c80",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Search over Collection Re-ranked by MMR\n",
    "\n",
    "Zep offers native, hardware-accelerated MMR re-ranking of search results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "488112da752b1d58",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:07:54.394873Z",
     "start_time": "2023-08-13T01:07:54.180901Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the positions of the two principal planets, (and these the most\n",
      "necessary for the navigator,) Jupiter and Saturn, require each not less\n",
      "than one hundred and sixteen tables. Yet it is not only necessary to\n",
      "predict the position of these bodies, but it is likewise expedient to\n",
      "tabulate the motions of the four satellites of Jupiter, to predict the\n",
      "exact times at which they enter his shadow, and at which their shadows\n",
      "cross his disc, as well as the times at which they are interposed \n",
      "====\n",
      "\n",
      "the scheme of notation thus applied, immediately suggested the\n",
      "advantages which must attend it as an instrument for expressing the\n",
      "structure, operation, and circulation of the animal system; and we\n",
      "entertain no doubt of its adequacy for that purpose. Not only the\n",
      "mechanical connexion of the solid members of the bodies of men and\n",
      "animals, but likewise the structure and operation of the softer parts,\n",
      "including the muscles, integuments, membranes, &c. the nature, motion, \n",
      "====\n",
      "\n",
      "resistance, economizing time, harmonizing the mechanism, and giving to\n",
      "the whole mechanical action the utmost practical perfection.\n",
      "\n",
      "The system of mechanical contrivances by which the results, here\n",
      "attempted to be described, are attained, form only one order of\n",
      "expedients adopted in this machinery;--although such is the perfection\n",
      "of their action, that in any ordinary case they would be regarded as\n",
      "having attained the ends in view with an almost superfluous degree of \n",
      "====\n"
     ]
    }
   ],
   "source": [
    "query = \"what is the structure of our solar system?\"\n",
    "docs = await vs.asearch(query, search_type=\"mmr\", k=3)\n",
    "\n",
    "for d in docs:\n",
    "    print(d.page_content, \"\\n====\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42455e31d4ab0d68",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Filter by Metadata\n",
    "\n",
    "Use a metadata filter to narrow down results. First, load another book: \"Adventures of Sherlock Holmes\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "146c8a96201c0ab9",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:08:06.323569Z",
     "start_time": "2023-08-13T01:07:54.381822Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 401/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 901/1691 documents embedded\n",
      "Embedding status: 1401/1691 documents embedded\n",
      "Embedding status: 1401/1691 documents embedded\n",
      "Embedding status: 1401/1691 documents embedded\n",
      "Embedding status: 1401/1691 documents embedded\n",
      "Embedding status: 1691/1691 documents embedded\n"
     ]
    }
   ],
   "source": [
    "# Let's add more content to the existing Collection\n",
    "article_url = \"https://www.gutenberg.org/files/48320/48320-0.txt\"\n",
    "loader = WebBaseLoader(article_url)\n",
    "documents = loader.load()\n",
    "\n",
    "# split it into chunks\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "await vs.aadd_documents(docs)\n",
    "\n",
    "await wait_for_ready(collection_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b225f3ae1e61de8",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "We see results from both books. Note the `source` metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "53700a9cd817cde4",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:08:06.504769Z",
     "start_time": "2023-08-13T01:08:06.325435Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "or remotely, for this purpose. But in addition to these, a great number\n",
      "of tables, exclusively astronomical, are likewise indispensable. The\n",
      "predictions of the astronomer, with respect to the positions and motions\n",
      "of the bodies of the firmament, are the means, and the only means, which\n",
      "enable the mariner to prosecute his art. By these he is enabled to\n",
      "discover the distance of his ship from the Line, and the extent of his  ->  {'source': 'https://www.gutenberg.org/cache/epub/71292/pg71292.txt'} \n",
      "====\n",
      "\n",
      "possess all knowledge which is likely to be useful to him in his work,\n",
      "and this I have endeavored in my case to do. If I remember rightly, you\n",
      "on one occasion, in the early days of our friendship, defined my limits\n",
      "in a very precise fashion.”\n",
      "\n",
      "“Yes,” I answered, laughing. “It was a singular document. Philosophy,\n",
      "astronomy, and politics were marked at zero, I remember. Botany\n",
      "variable, geology profound as regards the mud-stains from any region  ->  {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
      "====\n",
      "\n",
      "of astronomy, and its kindred sciences, with the various arts dependent\n",
      "on them. In none are computations more operose than those which\n",
      "astronomy in particular requires;--in none are preparatory facilities\n",
      "more needful;--in none is error more detrimental. The practical\n",
      "astronomer is interrupted in his pursuit, and diverted from his task of\n",
      "observation by the irksome labours of computation, or his diligence in\n",
      "observing becomes ineffectual for want of yet greater industry of  ->  {'source': 'https://www.gutenberg.org/cache/epub/71292/pg71292.txt'} \n",
      "====\n"
     ]
    }
   ],
   "source": [
    "query = \"Was he interested in astronomy?\"\n",
    "docs = await vs.asearch(query, search_type=\"similarity\", k=3)\n",
    "\n",
    "for d in docs:\n",
    "    print(d.page_content, \" -> \", d.metadata, \"\\n====\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b81d7cae351a1ec",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Now, we set up a filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8f1bdcba03979d22",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-08-13T01:08:06.672836Z",
     "start_time": "2023-08-13T01:08:06.505944Z"
    },
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "possess all knowledge which is likely to be useful to him in his work,\n",
      "and this I have endeavored in my case to do. If I remember rightly, you\n",
      "on one occasion, in the early days of our friendship, defined my limits\n",
      "in a very precise fashion.”\n",
      "\n",
      "“Yes,” I answered, laughing. “It was a singular document. Philosophy,\n",
      "astronomy, and politics were marked at zero, I remember. Botany\n",
      "variable, geology profound as regards the mud-stains from any region  ->  {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
      "====\n",
      "\n",
      "the light shining upon his strong-set aquiline features. So he sat as I\n",
      "dropped off to sleep, and so he sat when a sudden ejaculation caused me\n",
      "to wake up, and I found the summer sun shining into the apartment. The\n",
      "pipe was still between his lips, the smoke still curled upward, and the\n",
      "room was full of a dense tobacco haze, but nothing remained of the heap\n",
      "of shag which I had seen upon the previous night.\n",
      "\n",
      "“Awake, Watson?” he asked.\n",
      "\n",
      "“Yes.”\n",
      "\n",
      "“Game for a morning drive?”  ->  {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
      "====\n",
      "\n",
      "“I glanced at the books upon the table, and in spite of my ignorance\n",
      "of German I could see that two of them were treatises on science, the\n",
      "others being volumes of poetry. Then I walked across to the window,\n",
      "hoping that I might catch some glimpse of the country-side, but an oak\n",
      "shutter, heavily barred, was folded across it. It was a wonderfully\n",
      "silent house. There was an old clock ticking loudly somewhere in the\n",
      "passage, but otherwise everything was deadly still. A vague feeling of  ->  {'source': 'https://www.gutenberg.org/files/48320/48320-0.txt'} \n",
      "====\n"
     ]
    }
   ],
   "source": [
    "filter = {\n",
    "    \"where\": {\n",
    "        \"jsonpath\": (\n",
    "            \"$[*] ? (@.source == 'https://www.gutenberg.org/files/48320/48320-0.txt')\"\n",
    "        )\n",
    "    },\n",
    "}\n",
    "\n",
    "docs = await vs.asearch(query, search_type=\"similarity\", metadata=filter, k=3)\n",
    "\n",
    "for d in docs:\n",
    "    print(d.page_content, \" -> \", d.metadata, \"\\n====\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96132aa6",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
